28 research outputs found

    Volume Comparison with Integral Bounds in Lorentz Manifolds


    Learning Large-Scale Bayesian Networks with the sparsebn Package

    Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays datasets often have upwards of thousands, sometimes tens or hundreds of thousands, of variables and far fewer samples. To meet this challenge, we have developed a new R package called sparsebn for learning the structure of large, sparse graphical models, with a focus on Bayesian networks. While there are many existing software packages for this task, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. Additionally, the sparsebn package is fully compatible with existing software packages for network analysis.
    Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
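
    sparsebn itself is an R package, so the Python sketch below illustrates only the problem setting, not sparsebn's actual algorithm or API: it simulates a sparse linear Gaussian SEM with more variables than samples and recovers each node's parents by L1-penalized regression. The known variable ordering used here is a simplifying assumption that sparsebn does not require.

    ```python
    # Illustration of sparse DAG recovery in the high-dimensional regime.
    # NOT sparsebn's algorithm: we assume a known topological order.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    d, n = 50, 40                               # more variables than samples
    W = np.triu(rng.normal(0, 1, (d, d)), k=1)  # DAG weights (upper triangular)
    W *= rng.random((d, d)) < 0.05              # keep ~5% of edges (sparse)

    # Generate n samples from the linear SEM X_j = sum_i W[i, j] X_i + noise,
    # filling in the nodes in topological (column) order.
    X = np.zeros((n, d))
    for j in range(d):
        X[:, j] = X @ W[:, j] + rng.normal(size=n)

    # Recover each node's parents with an L1-penalized regression
    # on its predecessors in the (assumed known) ordering.
    W_hat = np.zeros((d, d))
    for j in range(1, d):
        W_hat[:j, j] = Lasso(alpha=0.1).fit(X[:, :j], X[:, j]).coef_

    true_e = set(zip(*np.nonzero(W)))
    found_e = set(zip(*np.nonzero(W_hat)))
    print(f"recovered {len(true_e & found_e)} of {len(true_e)} true edges")
    ```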

    Learning nonparametric latent causal graphs with unknown interventions

    We establish conditions under which latent causal graphs are nonparametrically identifiable and can be reconstructed from unknown interventions in the latent space. Our primary focus is the identification of the latent structure in measurement models without parametric assumptions such as linearity or Gaussianity. Moreover, we do not assume the number of hidden variables is known, and we show that at most one unknown intervention per hidden variable is needed. This extends a recent line of work on learning causal representations from observations and interventions. The proofs are constructive and introduce two new graphical concepts, imaginary subsets and isolated edges, that may be useful in their own right. As a matter of independent interest, the proofs also involve a novel characterization of the limits of edge orientations within the equivalence class of DAGs induced by unknown interventions. These are the first results to characterize the conditions under which causal representations are identifiable without making any parametric assumptions, in a general setting with unknown interventions and without faithfulness.
    Comment: To appear at NeurIPS 202
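
    To make the setting concrete, here is a toy Python generator for a measurement model with one unknown latent intervention. The two-variable latent graph, the nonlinear mechanisms, and the intervention target are all hypothetical choices for illustration, not a construction from the paper.

    ```python
    # Toy measurement model: latent DAG H1 -> H2 with nonparametric
    # mechanisms; each observed variable measures exactly one latent.
    # All concrete choices here are illustrative, not from the paper.
    import numpy as np

    rng = np.random.default_rng(1)

    def sample(n, intervene_h1=False):
        # An *unknown* intervention replaces H1's mechanism; the learner
        # is not told which latent node (if any) was targeted.
        h1 = rng.uniform(-2, 2, n) if intervene_h1 else rng.normal(0, 1, n)
        h2 = np.tanh(h1) + 0.3 * rng.normal(size=n)   # nonlinear latent mechanism
        x1 = h1 ** 3 + 0.1 * rng.normal(size=n)       # measures H1 only
        x2 = np.sin(h2) + 0.1 * rng.normal(size=n)    # measures H2 only
        return np.column_stack([x1, x2])

    observational = sample(5000)
    interventional = sample(5000, intervene_h1=True)  # one unknown intervention
    print(observational.mean(axis=0), interventional.mean(axis=0))
    ```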

    A super-polynomial lower bound for learning nonparametric mixtures

    We study the problem of learning nonparametric distributions in a finite mixture, and establish a super-polynomial lower bound on the sample complexity of learning the component distributions in such models. Namely, we are given i.i.d. samples from $f$ where $$f=\sum_{i=1}^k w_i f_i, \quad \sum_{i=1}^k w_i=1, \quad w_i>0,$$ and we are interested in learning each component $f_i$. Without any assumptions on the $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\operatorname{supp}(\nu_i)\cap \operatorname{supp}(\nu_j)=\emptyset$ for $i\neq j$. Our main result shows that $\Omega\big((1/\varepsilon)^{C\log\log(1/\varepsilon)}\big)$ samples are required for estimating each $f_i$. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. This result has important implications for the hardness of learning more general nonparametric latent variable models that arise in machine learning applications.
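
    For concreteness, the following Python snippet draws samples from one instance of this model class. The number of components, the weights, the disjoint supports, and the choice of uniform densities for the $\nu_i$ are all hypothetical.

    ```python
    # Sampler for the mixture model above: f = sum_i w_i f_i, where each
    # f_i is a Gaussian convolved with a compactly supported nu_i whose
    # supports are pairwise disjoint. Concrete choices are illustrative.
    import numpy as np

    rng = np.random.default_rng(2)
    w = np.array([0.5, 0.3, 0.2])                      # weights, sum to 1
    supports = [(-3.0, -2.0), (0.0, 1.0), (4.0, 5.0)]  # disjoint supp(nu_i)
    sigma = 1.0                                        # Gaussian scale

    def sample_f(n):
        """Draw n i.i.d. samples from f: pick component i ~ w, draw from
        nu_i (uniform on its interval), then add Gaussian noise."""
        comp = rng.choice(len(w), size=n, p=w)
        lo = np.array([supports[i][0] for i in comp])
        hi = np.array([supports[i][1] for i in comp])
        latent = rng.uniform(lo, hi)                   # sample from nu_i
        return latent + sigma * rng.normal(size=n)

    x = sample_f(10_000)
    print(x.mean(), x.std())
    ```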

    DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization

    The combinatorial problem of learning directed acyclic graphs (DAGs) from data was recently framed as a purely continuous optimization problem by leveraging a differentiable acyclicity characterization of DAGs based on the trace of a matrix exponential function. Existing acyclicity characterizations are based on the idea that powers of an adjacency matrix contain information about walks and cycles. In this work, we propose a fundamentally different acyclicity characterization based on the log-determinant (log-det) function, which leverages the nilpotency property of DAGs. To deal with the inherent asymmetries of a DAG, we relate the domain of our log-det characterization to the set of M-matrices, a key difference from the classical log-det function defined over the cone of positive definite matrices. Like previously proposed acyclicity functions, our characterization is exact and differentiable. However, compared to existing characterizations, our log-det function: (1) is better at detecting large cycles; (2) has better-behaved gradients; and (3) runs about an order of magnitude faster in practice. On the optimization side, we drop the typically used augmented Lagrangian scheme and propose DAGMA (Directed Acyclic Graphs via M-matrices for Acyclicity), a method that resembles the central path for barrier methods. Each point on the central path of DAGMA is a solution to an unconstrained problem regularized by our log-det function, and we show that in the limit of the central path the solution is guaranteed to be a DAG. Finally, we provide extensive experiments for linear and nonlinear SEMs, and show that our approach achieves large speed-ups and smaller structural Hamming distances compared to state-of-the-art methods.
    Comment: To appear at NeurIPS 202
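
    As a minimal sketch, the log-det characterization from the paper is $h(W) = -\log\det(sI - W\circ W) + d\log s$, which vanishes exactly when $W$ is the weighted adjacency matrix of a DAG, on the M-matrix domain (e.g. when $s$ exceeds the spectral radius of $W\circ W$). The small matrices below are illustrative.

    ```python
    # Sketch of DAGMA's log-det acyclicity function:
    # h(W) = -log det(s*I - W∘W) + d*log(s), zero iff W is a DAG
    # (valid while s*I - W∘W stays in the M-matrix domain).
    import numpy as np

    def h_logdet(W, s=1.0):
        d = W.shape[0]
        M = s * np.eye(d) - W * W      # W∘W is the Hadamard (elementwise) square
        sign, logabsdet = np.linalg.slogdet(M)
        assert sign > 0, "s*I - W∘W left the M-matrix domain; increase s"
        return -logabsdet + d * np.log(s)

    dag = np.array([[0.0, 0.8, 0.0],
                    [0.0, 0.0, 0.5],
                    [0.0, 0.0, 0.0]])  # acyclic: strictly upper triangular
    cyc = dag.copy()
    cyc[2, 0] = 0.9                    # adds the cycle 0 -> 1 -> 2 -> 0

    print(h_logdet(dag))               # ~0.0 for a DAG
    print(h_logdet(cyc))               # > 0 once a cycle appears
    ```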